
Stochastic Conjugate Frameworks for Nonconvex and Nonsmooth Optimization

Published 20 Oct 2023 in math.OC, cs.NA, and math.NA | arXiv:2310.13251v1

Abstract: We introduce two new stochastic conjugate frameworks for a class of nonconvex and possibly nonsmooth optimization problems. These frameworks are built upon the Stochastic Recursive Gradient Algorithm (SARAH), and we thus refer to them as Acc-Prox-CG-SARAH and Acc-Prox-CG-SARAH-RS, respectively. They are efficiently accelerated, easy to implement, tuning-free, and can be smoothly extended and modified. We devise a deterministic restart scheme for stochastic optimization and apply it in our second stochastic conjugate framework, which serves as the key difference between the two approaches. In addition, we apply the ProbAbilistic Gradient Estimator (PAGE) and further develop a practical variant, denoted Acc-Prox-CG-SARAH-ST, in order to reduce potential computational overhead. We provide a comprehensive and rigorous convergence analysis for all three approaches and establish linear convergence rates for unconstrained minimization problems with nonconvex and nonsmooth objective functions. Experiments demonstrate that Acc-Prox-CG-SARAH and Acc-Prox-CG-SARAH-RS consistently outperform state-of-the-art methods, and that Acc-Prox-CG-SARAH-ST achieves comparable convergence speed. Both in theory and in experiments, we verify the strong computational efficiency of the deterministic restart scheme in stochastic optimization methods.
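To make the setting concrete, the sketch below illustrates the generic building blocks that such frameworks combine: a SARAH-style recursive gradient estimator, a conjugate-gradient-type search direction, and a proximal step that handles the nonsmooth term. It is a minimal illustration under stated assumptions, not the paper's Acc-Prox-CG-SARAH algorithm: the acceleration, the deterministic restart scheme, the PAGE estimator, and the step-size rules are omitted, and the Fletcher-Reeves choice of the conjugate coefficient, the L1 regularizer, and all function names are assumptions made for this example.

```python
# Illustrative sketch only: a proximal stochastic loop combining a SARAH-style
# recursive gradient estimator with a conjugate-gradient-type direction.
# This is NOT the paper's Acc-Prox-CG-SARAH; acceleration, restart, and
# step-size rules are omitted, and beta follows an assumed Fletcher-Reeves rule.
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||x||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def logistic_grad(w, A, b):
    """Gradient of the averaged logistic loss over rows of A, labels b in {-1,+1}."""
    z = b * (A @ w)
    return -(A.T @ (b / (1.0 + np.exp(z)))) / A.shape[0]

def prox_cg_sarah_sketch(A, b, lam=1e-3, eta=0.1, epochs=5, batch=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        v = logistic_grad(w, A, b)      # full gradient at the start of each outer loop
        direction = -v                  # initial (steepest-descent) direction
        w_prev = w.copy()
        w = soft_threshold(w + eta * direction, eta * lam)
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            # SARAH recursive estimator: correct the previous estimate v on a mini-batch
            v_new = logistic_grad(w, A[idx], b[idx]) - logistic_grad(w_prev, A[idx], b[idx]) + v
            # Conjugate coefficient (Fletcher-Reeves-type, illustrative choice)
            beta = (v_new @ v_new) / max(v @ v, 1e-12)
            direction = -v_new + beta * direction
            w_prev, v = w.copy(), v_new
            # Proximal step on the nonsmooth L1 term
            w = soft_threshold(w + eta * direction, eta * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 20))
    b = np.sign(A @ rng.normal(size=20) + 0.1 * rng.normal(size=200))
    w_hat = prox_cg_sarah_sketch(A, b)
    print("nonzero coefficients:", np.count_nonzero(w_hat))
```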

Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. 
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. 
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. 
[2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. 
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. 
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. 
Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. 
IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 
1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  2. Guo, P., Ye, Z., Xiao, K., Zhu, W.: Weighted aggregating stochastic gradient descent for parallel deep learning. IEEE Transactions on Knowledge and Data Engineering 34(10), 5037–5050 (2020) Ghadimi and Lan [2013] Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4), 2341–2368 (2013) Robbins and Monro [1951] Robbins, H., Monro, S.: A stochastic approximation method. The annals of mathematical statistics, 400–407 (1951) Roux et al. [2012] Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence _rate for finite training sets. Advances in neural information processing systems 25 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems 27 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. 
[2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). 
PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. 
[2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. 
[2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. 
SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. 
Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 
269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  3. Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4), 2341–2368 (2013) Robbins and Monro [1951] Robbins, H., Monro, S.: A stochastic approximation method. The annals of mathematical statistics, 400–407 (1951) Roux et al. [2012] Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence _rate for finite training sets. Advances in neural information processing systems 25 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems 27 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. 
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Robbins, H., Monro, S.: A stochastic approximation method. The annals of mathematical statistics, 400–407 (1951) Roux et al. [2012] Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence _rate for finite training sets. Advances in neural information processing systems 25 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems 27 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. 
[2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. 
[2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence _rate for finite training sets. Advances in neural information processing systems 25 (2012) Defazio et al. [2014] Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems 27 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. 
[2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. 
Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. 
Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. 
[2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Póczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. 
Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. 
Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. 
Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. 
Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. 
Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. 
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. 
Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. 
Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. 
IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 
1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on Optimization 2(1), 21–42 (1992)
5. Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. Advances in Neural Information Processing Systems 25 (2012)
6. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Processing Systems 27 (2014)
7. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems 26 (2013)
8. Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017)
9. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR
10. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017)
11. Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in Neural Information Processing Systems 31 (2018)
12. Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017)
13. Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020)
14. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016)
15. Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020)
16. Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017)
17. Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR
18. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
19. Fang, C., Li, C.J., Lin, Z., Zhang, T.: SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
20. Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
21. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. 
[2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. 
[2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. 
[2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. 
Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. 
[2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017)
Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020)
Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1–2), 267–305 (2016)
Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020)
Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017)
Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR
Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. 
Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 
269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  6. Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems 27 (2014) Johnson and Zhang [2013] Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. 
[2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. 
Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. 
Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
  7. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems 26 (2013) Konečnỳ and Richtárik [2017] Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. 
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. 
Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. 
In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. 
Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
(2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
  8. Konečnỳ, J., Richtárik, P.: Semi-stochastic gradient descent methods. Frontiers in Applied Mathematics and Statistics 3, 9 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Sarah: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017). PMLR Schmidt et al. [2017] Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. 
Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. 
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162, 83–112 (2017) Li and Li [2018] Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. 
Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. 
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. 
SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). 
PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. 
[2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. 
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. 
Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. 
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 
269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. 
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  11. Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Advances in neural information processing systems 31 (2018) Pedregosa et al. [2017] Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. 
[2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. 
IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 
269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  12. Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: A scalable parallel method for composite optimization. Advances in Neural Information Processing Systems 30 (2017) Pham et al. [2020] Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. [2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. The Journal of Machine Learning Research 21(1), 4455–4502 (2020) Ghadimi et al. 
[2016] Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016) Liu et al. [2020] Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. 
[2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020) Allen-Zhu [2017] Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017) Zhou et al. [2018a] Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. 
Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 
269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi et al. [2016] Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. 
Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
14. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1-2), 267–305 (2016)
15. Liu, Y., Wang, X., Guo, T.: A linearly convergent stochastic recursive gradient method for convex optimization. Optimization Letters 14, 2265–2283 (2020)
16. Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017)
17. Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR
18. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
19. Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
20. Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
21. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. 
In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. 
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. 
Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. 
In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. 
[2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. 
arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. 
[2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. 
[2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. 
[2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). 
PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. 
IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. 
SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 
78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  17. Zhou, K., Shang, F., Cheng, J.: A simple stochastic variance reduced algorithm with fast convergence rates. In: International Conference on Machine Learning, pp. 5980–5989 (2018). PMLR Zhou et al. [2018b] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Fang et al. [2018] Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in neural information processing systems 31 (2018) Wang et al. [2019] Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019) Hanzely et al. [2018] Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
18. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
19. Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
20. Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
21. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018) Zhao and Zhang [2015] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. 
SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
19. Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. Advances in Neural Information Processing Systems 31 (2018)
20. Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost and momentum: Faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
21. Hanzely, F., Mishchenko, K., Richtárik, P.: Sega: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. 
Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  20. Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: SpiderBoost and momentum: faster variance reduction algorithms. Advances in Neural Information Processing Systems 32 (2019)
  21. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
  22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
  23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
  24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
  25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
  26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
  27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
  28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
  29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
  30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
  31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
  32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
  33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
  34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
  35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
  36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
  37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
  38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
  39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
  40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
  41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
  42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
  44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
  45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
  46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: a simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. 
Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. 
SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on Optimization 2(1), 21–42 (1992)
21. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. Advances in Neural Information Processing Systems 31 (2018)
22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR
23. Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Póczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Póczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization.
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. 
[2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. 
[2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. 
AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  22. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, pp. 1–9 (2015). PMLR Murata and Suzuki [2017] Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017) Lan and Zhou [2018] Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. 
IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical programming 171, 167–215 (2018) Allen-Zhu and Yuan [2016] Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. [2020] Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020) Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Yuan, Y.: Improved svrg for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR Hien et al. [2019] Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019) Li et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Murata, T., Suzuki, T.: Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Advances in Neural Information Processing Systems 30 (2017)
Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018) Reddi et al. 
[2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 
314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic frank-wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE Liu et al. 
[2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. 
[2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. 
[2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. 
The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
24. Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Mathematical Programming 171, 167–215 (2018)
Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018) Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  25. Allen-Zhu, Z., Yuan, Y.: Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In: International Conference on Machine Learning, pp. 1080–1089 (2016). PMLR
  26. Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
  27. Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
  28. Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
  29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
  30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
  31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
  32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
  33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
  34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
  35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
  36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
  37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
  38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
  39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
  40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
  41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
  42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
  44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
  45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
  46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. 
Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. 
IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
Hien, L.T.K., Nguyen, C.V., Xu, H., Lu, C., Feng, J.: Accelerated randomized mirror descent algorithms for composite non-strongly convex optimization. Journal of Optimization Theory and Applications 181, 541–566 (2019)
Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schönlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods.
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Li, H., Fang, C., Lin, Z.: Accelerated first-order optimization algorithms for machine learning. Proceedings of the IEEE 108(11), 2067–2082 (2020)
Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank–Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. 
Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
Chambolle et al. [2018] Chambolle, A., Ehrhardt, M.J., Richtárik, P., Schonlieb, C.-B.: Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM Journal on Optimization 28(4), 2783–2808 (2018)
Reddi et al. [2016] Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
Liu et al. [2018] Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
Huang and Zhou [2015] Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. 
In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  29. Reddi, S.J., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016). IEEE
  30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
  31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
  32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
  33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
  34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
  35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
  36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
  37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
  38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
  39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
  40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
  41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
  42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
  44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
  45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
  46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k²). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. 
IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. 
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. 
Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. 
In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
30. Liu, J., Lin, L., Ren, H., Gu, M., Wang, J., Youn, G., Kim, J.-U.: Building neural network language model with pos-based negative sampling and stochastic conjugate gradient descent. Soft Computing 22, 6705–6717 (2018)
31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015)
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. 
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. 
[2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. 
In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). 
PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. 
In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  31. Huang, W., Zhou, H.-W.: Least-squares seismic inversion with stochastic conjugate gradient method. Journal of Earth Science 26, 463–470 (2015) Li et al. [2018] Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018) Wei et al. [2020] Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. 
[2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive mimo detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020) Jin et al. [2018] Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. 
arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE transactions on neural networks and learning systems 30(5), 1360–1369 (2018) Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021) Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. 
[2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. 
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. 
[2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 
699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. 
Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
32. Li, C., Huang, J., Li, Z., Wang, R.: Plane-wave least-squares reverse time migration with a preconditioned stochastic conjugate gradient method. Geophysics 83(1), 33–46 (2018)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
33. Wei, Y., Zhao, M.-M., Hong, M., Zhao, M.-J., Lei, M.: Learned conjugate gradient descent network for massive MIMO detection. IEEE Transactions on Signal Processing 68, 6336–6349 (2020)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate $O(1/k^{2})$. In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
34. Jin, X.-B., Zhang, X.-Y., Huang, K., Geng, G.-G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)
35. Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022) Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. 
Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. 
Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. 
In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. 
[2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. 
SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
Xue et al. [2021] Xue, W., Wan, P., Li, Q., Zhong, P., Yu, G., Tao, T.: An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Mathematics 6(2), 1515–1537 (2021)
Kou and Yang [2022] Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
Yang [2022a] Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi et al. [2016] Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
[2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. 
In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  36. Kou, C., Yang, H.: A mini-batch stochastic conjugate gradient algorithm with variance reduction. Journal of Global Optimization, 1–17 (2022)
  37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022)
  38. Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
  39. Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
  40. Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
  41. Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
  42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
  44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
  45. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
  46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  37. Yang, Z.: Adaptive stochastic conjugate gradient for machine learning. Expert Systems with Applications 206, 117719 (2022) Yang [2022b] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022) Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). 
PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z., Wang, C., Zhang, Z., Li, J.: Random barzilai–borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018) Baydin et al. 
[2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). 
PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017) Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. 
[2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. 
[2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Yang [2022] Yang, Z.: Large-scale machine learning with fast and stable stochastic conjugate gradient. Computers & Industrial Engineering 173, 108656 (2022)
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
[2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 
6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. 
In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Yang et al. [2018] Yang, Z., Wang, C., Zhang, Z., Li, J.: Random Barzilai–Borwein step size for mini-batch algorithms. Engineering Applications of Artificial Intelligence 72, 124–135 (2018)
Baydin et al. [2017] Baydin, A.G., Cornish, R., Rubio, D.M., Schmidt, M., Wood, F.: Online learning rate adaptation with hypergradient descent. arXiv preprint arXiv:1703.04782 (2017)
Yang [2023] Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Lei et al. [2017] Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Nocedal and Wright [2006] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien.
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
Yang, Z.: Adaptive powerball stochastic conjugate gradient for large-scale learning. IEEE Transactions on Big Data (2023)
Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer (2006)
Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
[2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. 
In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. 
[2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. 
In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 
4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  42. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems 30 (2017)
  44. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017)
  45. J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in Neural Information Processing Systems 29 (2016)
  46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
[2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. 
SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. 
[2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  43. Lei, L., Ju, C., Chen, J., Jordan, M.I.: Non-convex finite-sum optimization via scsg methods. Advances in Neural Information Processing Systems 30 (2017) Nguyen et al. [2017] Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). 
PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261 (2017) J Reddi et al. [2016] J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). 
In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. 
arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). 
Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  45. J Reddi, S., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Advances in neural information processing systems 29 (2016) Zhou et al. [2019] Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR Reddi et al. [2016] Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR Allen-Zhu and Hazan [2016] Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. 
biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR Jorge and Stephen [2006] Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. 
[2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Jorge, N., Stephen, J.W.: Numerical Optimization. Spinger, ??? (2006) Li et al. [2021] Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR Tran-Dinh et al. [2022] Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022) Zhou et al. [2020] Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. 
Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020) Nguyen et al. [2021] Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? 
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact sarah algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021) Wang et al. [2017] Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. 
In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). 
IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
46. Zhou, P., Yuan, X.-T., Feng, J.: Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 138–147 (2019). PMLR
47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016). PMLR
48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer (2006)
50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate $O(1/k^{2})$. In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences 255, 2897–2899 (1962)
  48. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016). PMLR
  49. Jorge, N., Stephen, J.W.: Numerical Optimization. Springer, ??? (2006)
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  50. Li, Z., Bao, H., Zhang, X., Richtárik, P.: Page: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In: International Conference on Machine Learning, pp. 6286–6295 (2021). PMLR
  51. Tran-Dinh, Q., Pham, N.H., Phan, D.T., Nguyen, L.M.: A hybrid stochastic optimization framework for composite nonconvex optimization. Mathematical Programming 191(2), 1005–1071 (2022)
  52. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. The Journal of Machine Learning Research 21(1), 4130–4192 (2020)
(2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017) Zhou et al. [2018] Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018) Nesterov [1983] Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate o\\\backslash\bigl(k^2\\\backslash\bigr). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences Gilbert and Nocedal [1992] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). 
PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on optimization 2(1), 21–42 (1992) Nesterov [2003] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course vol. 87. Springer, ??? (2003) Agarwal and Bottou [2015] Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR Kovalev et al. [2020] Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Kovalev, D., Horváth, S., Richtárik, P.: Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR Li et al. [2020] Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Li, B., Ma, M., Giannakis, G.B.: On the convergence of sarah and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR Donoho and Johnstone [1994] Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. 
Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. biometrika 81(3), 425–455 (1994) Metel and Takeda [2019] Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR Zhao et al. [2010] Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE Moreau [1962] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962) Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
  53. Nguyen, L.M., Scheinberg, K., Takáč, M.: Inexact SARAH algorithm for stochastic optimization. Optimization Methods and Software 36(1), 237–258 (2021)
  54. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization 27(2), 927–956 (2017)
  55. Zhou, D., Xu, P., Gu, Q.: Finding local minima via stochastic nested variance reduction. arXiv preprint arXiv:1806.08782 (2018)
  56. Nesterov, Y.E.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Doklady Akademii Nauk, vol. 269, pp. 543–547 (1983). Russian Academy of Sciences
  57. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2(1), 21–42 (1992)
  58. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer (2003)
  59. Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. In: International Conference on Machine Learning, pp. 78–86 (2015). PMLR
  60. Kovalev, D., Horváth, S., Richtárik, P.: Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop. In: Algorithmic Learning Theory, pp. 451–467 (2020). PMLR
  61. Li, B., Ma, M., Giannakis, G.B.: On the convergence of SARAH and beyond. In: International Conference on Artificial Intelligence and Statistics, pp. 223–233 (2020). PMLR
  62. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
  63. Metel, M., Takeda, A.: Simple stochastic gradient methods for non-smooth non-convex regularized optimization. In: International Conference on Machine Learning, pp. 4537–4545 (2019). PMLR
  64. Zhao, L., Mammadov, M., Yearwood, J.: From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1281–1288 (2010). IEEE
  65. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)
